A $(k+1)$-decision two-stage selection procedure is proposed for the problem
of comparing $k$ normal means with a fixed known standard when the associated
populations have a common unknown variance. None or one of the populations
is to be selected, the procedure having the property that (i) with probability
at least $P_0^*$ (specified), no population is to be selected when the largest
population mean is sufficiently less than the standard, and (ii) with probability
at least $P_1^*$ (specified), the population with the largest population mean is
to be selected when that mean is sufficiently greater than its closest competitor
and the standard. Tables to implement the procedure are provided. Applications
and generalizations are described.


Some key words: Ranking procedures, selection procedures, two-stage procedures,
comparisons with a fixed standard, indifference-zone approach, sampling plans.


1. Introduction

In an earlier paper, Bechhofer and Turnbull (1974) (hereafter B-T (1974)), the
authors proposed a $(k+1)$-decision single-stage selection procedure for comparing
$k$ normal means with a fixed known standard when the $k$ populations have a
common known variance. In the present paper we propose a two-stage procedure for
the same problem when the populations have a common unknown variance. Thus the
present paper bears the same relationship to our earlier paper as Bechhofer,
Dunnett, and Sobel (1954) (hereafter B-D-S (1954)) does to Bechhofer (1954). The
reader is referred to our earlier paper for motivation concerning the practical
situations which lead us to consider such $(k+1)$-decision selection problems, and
to B-D-S (1954) and Dudewicz (1971) for the rationale for adopting a two-stage
procedure when the experimenter is faced with situations in which the common
variance is unknown.


2. Assumptions

We assume that we have $k$ normal populations $\Pi_i$ with unknown population
means $\mu_i$ ($1 \le i \le k$) and a common unknown variance $\sigma^2$; there is
also a given known standard $\mu_0$ with which the $\mu_i$ are to be compared. The
ranked values of the $\mu_i$ are denoted by $\mu_{[1]} \le \mu_{[2]} \le \cdots \le \mu_{[k]}$.
It is assumed that the experimenter has no prior knowledge concerning the pairing
of the $\Pi_i$ with the $\mu_{[j]}$ ($1 \le i, j \le k$); it is further assumed that the
experimenter has no prior knowledge concerning how many or which (if any)
populations have $\mu$-values which are $> \mu_0$.


3. Goal, probability requirement, and procedure

In this section we formulate our goal and probability requirement using
the indifference-zone approach, and propose a two-stage procedure which will
guarantee this probability requirement.

3.1 The goal

The goal (i.e., objective of the experiment) is:

"To select the population associated with $\mu_{[k]}$ provided that
$\mu_{[k]} > \mu_0$; if no population has a $\mu$-value which is $> \mu_0$,
then no population is to be selected."   (1)

3.2 The probability requirement

It is assumed that prior to the start of experimentation, the experimenter
can specify five constants $\{\delta_0^*, \delta_1^*, \delta_2^*; P_0^*, P_1^*\}$
$\{0 < \delta_1^*, \delta_2^* < \infty,\ -\delta_1^* < \delta_0^* < \infty;\ 2^{-k} < P_0^* < 1,\ (1-2^{-k})/k < P_1^* < 1\}$,
the values of which are to be based on economic considerations. The specified
constants, along with the known standard, are incorporated into the following
probability requirement:

$$\Pr\{\Pi_0\} \ge P_0^* \quad \text{whenever } \mu_{[k]} \le \mu_0 - \delta_0^*, \qquad (2a)$$

$$\Pr\{\Pi_{[k]}\} \ge P_1^* \quad \text{whenever } \mu_{[k]} \ge \mu_0 + \delta_1^* \text{ and } \mu_{[k]} \ge \mu_{[k-1]} + \delta_2^*, \qquad (2b)$$

where $\Pi_0$ ($\Pi_{[k]}$) denotes the event of selecting no population (the
population associated with $\mu_{[k]}$).
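As an illustration of requirement (2), the following Python sketch reports which of (2a), (2b), or neither constrains the procedure for a given vector of true means. The function `zone` and its argument names are hypothetical, introduced only for illustration; for $k = 1$ the missing runner-up is treated as $-\infty$, matching the convention $\mu_{[0]} = -\infty$ used in Theorem 2. Because $-\delta_1^* < \delta_0^*$, the two zones cannot overlap.

```python
def zone(mu, mu0, delta0, delta1, delta2):
    """Classify true means mu against requirement (2).

    Returns "2a" if Pr{select none} >= P0* is demanded, "2b" if
    Pr{select the best} >= P1* is demanded, else "indifference".
    Hypothetical helper for illustration only.
    """
    ranked = sorted(mu)
    best = ranked[-1]
    runner_up = ranked[-2] if len(ranked) > 1 else float("-inf")
    if best <= mu0 - delta0:
        return "2a"
    if best >= mu0 + delta1 and best >= runner_up + delta2:
        return "2b"
    return "indifference"


# Constants of the bolt example in Section 7 (psi): delta0* = 0, delta1* = delta2* = 250.
print(zone([59000, 59500, 59800], 60000, 0, 250, 250))  # 2a
print(zone([59000, 59700, 60300], 60000, 0, 250, 250))  # 2b
print(zone([60100, 60300, 60000], 60000, 0, 250, 250))  # indifference
```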


3.3 The selection procedure

The two-stage procedure which we adopt in this paper, and which guarantees
(2a) and (2b), is:

a) In the first stage take a common number $N_0 > 1$ of observations $X_{ij}$
($1 \le i \le k$, $1 \le j \le N_0$) from each of the $k$ populations.

b) Calculate
$$S^2 = \sum_{i=1}^{k} \sum_{j=1}^{N_0} \left( X_{ij} - \sum_{j'=1}^{N_0} X_{ij'}/N_0 \right)^2 \Big/ n,$$
which is an unbiased estimate of $\sigma^2$ based on $n = k(N_0 - 1)$ degrees of freedom.

c) Enter the appropriate table¹ with $k$, $n = k(N_0 - 1)$, and the specified
quantities $\{\delta_0^*, \delta_1^*, \delta_2^*; P_0^*, P_1^*\}$, and obtain a scalar $h$ and a
constant $c$ (in the units of the problem).

d) In the second stage, take a common number $N - N_0$ of additional observations
from each of the $k$ populations, where $N = \max\{[(hS/(\delta_0^* + c))^2] + 1,\ N_0\}$,
and $[x]$ denotes the largest integer less than $x$.

e) Calculate the $k$ overall (first-stage plus second-stage) sample means
$\bar X_i = \sum_{j=1}^{N} X_{ij}/N$ ($1 \le i \le k$), and denote their ranked values by
$\bar X_{[1]} \le \cdots \le \bar X_{[k]}$.

f) If $\bar X_{[k]} < \mu_0 + c$, select no population; if $\bar X_{[k]} \ge \mu_0 + c$, select the
population that produced $\bar X_{[k]}$ as the one associated with $\mu_{[k]}$.

¹ For the special case in which the experimenter specifies $\delta_0^* = 0$,
$\delta_1^* = \delta_2^* = \delta^*$ (say), the design constants $(h, c)$ can be obtained from
Table 1 in Section 6, which gives $h$ and $\gamma = c/\delta^*$.

Remark 1: In (3b) it is assumed that the experimenter has employed a completely
randomized design in the first stage of experimentation; in fact, he must use
that design rather than (say) a randomized blocks design, for were he to use
the latter design, it would turn out that

$$S^2 = \sum_{i=1}^{k} \sum_{j=1}^{N_0} \left( X_{ij} - \sum_{j'=1}^{N_0} X_{ij'}/N_0 - \sum_{i'=1}^{k} X_{i'j}/k + \sum_{i'=1}^{k} \sum_{j'=1}^{N_0} X_{i'j'}/(kN_0) \right)^2 \Big/ n,$$

based on $n = (k-1)(N_0-1)$ d.f., will underestimate the variance associated
with the total experiment. This is so because the final inference (3f) is
based on comparisons with the fixed known standard rather than on contrasts
among the $\mu_i$. (The situation here is different from that in B-D-S (1954).)
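Remark 1 can be checked on a toy data set. In the Python sketch below (the data and the function names are made up for illustration), every population is shifted by the same amount in each "block" $j$. The completely randomized estimate of (3b) counts that block-to-block variation, which does affect $\bar X_i - \mu_0$, whereas the randomized-blocks residual mean square removes it entirely.

```python
def crd_s2(x):
    """S^2 of step (3b): pooled within-population variance on k(N0 - 1) d.f.
    x is a list of k lists, each holding the N0 first-stage observations."""
    k, n0 = len(x), len(x[0])
    ss = 0.0
    for row in x:
        mean = sum(row) / n0
        ss += sum((v - mean) ** 2 for v in row)
    return ss / (k * (n0 - 1))


def blocks_s2(x):
    """Residual mean square of a randomized-blocks analysis in which the j-th
    observation of every population lies in block j; (k - 1)(N0 - 1) d.f."""
    k, n0 = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (k * n0)
    row_means = [sum(row) / n0 for row in x]
    col_means = [sum(x[i][j] for i in range(k)) / k for j in range(n0)]
    ss = sum((x[i][j] - row_means[i] - col_means[j] + grand) ** 2
             for i in range(k) for j in range(n0))
    return ss / ((k - 1) * (n0 - 1))


# Two populations, three blocks; the only variation is a common block effect.
data = [[9.0, 10.0, 11.0], [19.0, 20.0, 21.0]]
print(crd_s2(data), blocks_s2(data))  # 1.0 0.0
```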

The selection procedure (3) is completely defined once values of the design
constants $(h, c)$ are assigned; as noted above, these depend on $k$, $n$ and
the specified quantities $\{\delta_0^*, \delta_1^*, \delta_2^*; P_0^*, P_1^*\}$. Theorem 1, stated and
proved in the next section, tells how to determine $(h, c)$ so as to guarantee
(2a) and (2b).
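Once $(h, c)$ are in hand, steps (3d) and (3f) are mechanical. A minimal Python sketch follows; the function names are hypothetical, and the table lookup of step (3c) is assumed to have been done by hand. Because $[x]$ in (3d) is the largest integer strictly less than $x$, $[x] + 1$ equals the ceiling of $x$, which is what the code uses.

```python
import math


def total_sample_size(h, c, delta0, s2, n0):
    """Step (3d): N = max{[(hS/(delta0* + c))^2] + 1, N0}, using [x] + 1 = ceil(x)."""
    x = (h * math.sqrt(s2) / (delta0 + c)) ** 2
    return max(math.ceil(x), n0)


def select(overall_means, mu0, c):
    """Step (3f): index of the selected population, or None to select none."""
    best = max(range(len(overall_means)), key=lambda i: overall_means[i])
    return None if overall_means[best] < mu0 + c else best
```

For instance, with the constants of Section 7 ($h = 2.1855$, $c = 0.6234 \times 250$ psi, $\delta_0^* = 0$, $S = 753.1$ psi, $N_0 = 16$), `total_sample_size` returns 112.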

In closing this section we emphasize that the total sample size $N$ is a
random variable (since $S^2$ is a random variable), its distribution depending
not only on $k$, $n$ and $\{\delta_0^*, \delta_1^*, \delta_2^*; P_0^*, P_1^*\}$ but also on $\sigma^2$; the
variance of the distribution of $S^2$ decreases rapidly with increasing
$n = k(N_0 - 1)$.
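The last claim can be made quantitative. Since $nS^2/\sigma^2$ has a $\chi^2_n$ distribution, whose variance is $2n$, the following holds; the approximation for $N$ ignores the integer part and the floor $N_0$ in (3d).

```latex
\operatorname{Var}(S^2) = \frac{\sigma^4}{n^2}\operatorname{Var}(\chi^2_n) = \frac{2\sigma^4}{n},
\qquad
N \approx \left(\frac{hS}{\delta_0^* + c}\right)^2
\;\Longrightarrow\;
\frac{\operatorname{SD}(N)}{E\{N\}} \approx \frac{\operatorname{SD}(S^2)}{E\{S^2\}}
  = \sqrt{\frac{2}{n}} = \sqrt{\frac{2}{k(N_0 - 1)}}.
```

For example, in the setting of Section 7 ($k = 3$, $N_0 = 16$, so $n = 45$) the coefficient of variation of $N$ is roughly $\sqrt{2/45} \approx 0.21$.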

4. Theorems and proofs of theorems

4.1 Statement and proof of Theorem 1

Theorem 1: The $(h, c)$ which guarantee (2a) and (2b) are the pair which satisfy
the simultaneous equations:

$$\int_0^\infty \Phi^k(hz)\, g_n(z)\, dz = P_0^*, \qquad (4a)$$

$$\int_0^\infty \left[ \int_{(c-\delta_1^*)hz/(c+\delta_0^*)}^{\infty} \Phi^{k-1}\!\left(y + \frac{\delta_2^* hz}{c + \delta_0^*}\right) \varphi(y)\, dy \right] g_n(z)\, dz = P_1^*, \qquad (4b)$$

where $\Phi(\cdot)$ and $\varphi(\cdot)$ are the standard normal distribution and density
functions, respectively, and $g_n(\cdot)$ is the density function of $(\chi^2_n/n)^{1/2}$.

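In many applications it will be sufficient to consider a specialization
of (4). This is done in (5a,b).

Of special interest is the case $\delta_0^* = 0$, $\delta_1^* = \delta_2^* = \delta^*$ (say). Letting
$\gamma = c/\delta^*$, we then have that $(h^*, \gamma^*)$, with $h^* = h^*(k, n, P_0^*, P_1^*)$ and
$\gamma^* = \gamma^*(k, n, P_0^*, P_1^*)$, are the pair which satisfy the simultaneous equations:

$$\int_0^\infty \Phi^k(hz)\, g_n(z)\, dz = P_0^*, \qquad (5a)$$

$$\int_0^\infty \left[ \int_{(\gamma-1)hz/\gamma}^{\infty} \Phi^{k-1}(y + hz/\gamma)\, \varphi(y)\, dy \right] g_n(z)\, dz = P_1^*, \qquad (5b)$$

and $N = \max\{[(hS/(\gamma\delta^*))^2] + 1,\ N_0\}$. Tables of $(h^*, \gamma^*)$ for $k = 1(1)6, 10$
and selected $n$ and $\{P_0^*, P_1^*\}$ are given in Section 6. An example of the use
of the tables is given in Section 7.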
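Equations (5a,b) are a single and a double integral that are easy to evaluate numerically. The Python sketch below (standard library only; the function names, the composite Simpson rule, and the truncation of the integration ranges are choices made here, not taken from the paper) computes both left-hand sides, which can be used to check a tabulated pair $(h^*, \gamma^*)$. It uses the closed form $g_n(z) = 2(n/2)^{n/2} z^{n-1} e^{-nz^2/2}/\Gamma(n/2)$ for $z > 0$, obtained by a change of variables from the $\chi^2_n$ density.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.cdf is Phi, STD.pdf is phi


def simpson(f, a, b, m):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    step = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3


def g(z, n):
    """Density of (chi^2_n / n)^(1/2) at z, computed on the log scale."""
    log_const = math.log(2) + 0.5 * n * math.log(n / 2) - math.lgamma(n / 2)
    if z <= 0:
        return math.exp(log_const) if n == 1 else 0.0
    return math.exp(log_const + (n - 1) * math.log(z) - 0.5 * n * z * z)


def z_range(n):
    """Interval holding essentially all of the mass of g_n (a choice of this sketch)."""
    return max(0.0, 1 - 8 / math.sqrt(n)), 1 + 8 / math.sqrt(n)


def lhs_5a(k, n, h, m=800):
    lo, hi = z_range(n)
    return simpson(lambda z: STD.cdf(h * z) ** k * g(z, n), lo, hi, m)


def lhs_5b(k, n, h, gamma, m=800, m_inner=200):
    def inner(z):
        # integral over y >= (gamma - 1)hz/gamma of Phi^(k-1)(y + hz/gamma) phi(y), y cut at +-9
        lower = max((gamma - 1) * h * z / gamma, -9.0)
        if lower >= 9.0:
            return 0.0
        shift = h * z / gamma
        return simpson(lambda y: STD.cdf(y + shift) ** (k - 1) * STD.pdf(y),
                       lower, 9.0, m_inner)

    lo, hi = z_range(n)
    return simpson(lambda z: inner(z) * g(z, n), lo, hi, m)


# Table 1 entry quoted in Section 7: k = 3, n = 45, P0* = 0.95, P1* = 0.90.
print(round(lhs_5a(3, 45, 2.1855), 3), round(lhs_5b(3, 45, 2.1855, 0.6234), 3))
```

For the Table 1 entry quoted in Section 7, the two printed values should be close to $P_0^* = 0.95$ and $P_1^* = 0.90$.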


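Proof of Theorem 1

We first consider (2a) using (3), and find that

$$\begin{aligned}
\Pr\{\Pi_0 \mid \mu_{[k]} \le \mu_0 - \delta_0^*\}
&= \Pr\{\bar X_{[k]} < \mu_0 + c \mid \mu_{[k]} \le \mu_0 - \delta_0^*\} \\
&= \Pr\{\bar X_i - \mu_i < \mu_0 - \mu_i + c\ (1 \le i \le k) \mid \mu_i \le \mu_0 - \delta_0^*\ (1 \le i \le k)\} \\
&\ge \Pr\left\{ \frac{\bar X_i - \mu_i}{\sqrt{\sigma^2/N}} < \frac{\delta_0^* + c}{\sqrt{\sigma^2/N}}\ (1 \le i \le k) \right\}. \qquad (6a)
\end{aligned}$$

We proceed by first conditioning on $S^2$, noting that conditional on $S^2$ the
distribution of $\bar X_i$ is $N(\mu_i, \sigma^2/N)$, and then unconditioning on $S^2$ (the
distributions of $\bar X_i$ and $S^2$ being independent when $N$ is fixed). Continuing,
we see that the r.h.s. of (6a) is no less than

$$\begin{aligned}
\Pr\left\{ \frac{\bar X_i - \mu_i}{\sqrt{S^2/N}} < h_0\ (1 \le i \le k) \right\}
&= \Pr\left\{ \frac{\bar X_i - \mu_i}{\sqrt{\sigma^2/N}} < h_0 \frac{S}{\sigma}\ (1 \le i \le k) \right\} \\
&= \int_0^\infty \Phi^k(h_0 z)\, g_n(z)\, dz \qquad (6b)
\end{aligned}$$

provided that

$$N \ge \left( \frac{S h_0}{\delta_0^* + c} \right)^2. \qquad (7)$$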

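We next consider (2b) using (3), and find (using the same conditioning argument
as above, and the monotonicity results proved in B-T (1974)) that

$$\begin{aligned}
&\Pr\{\Pi_{[k]} \mid \mu_{[k]} \ge \mu_0 + \delta_1^*,\ \mu_{[k]} \ge \mu_{[k-1]} + \delta_2^*\} \\
&\quad = \Pr\{\bar X_{(k)} \ge \mu_0 + c,\ \bar X_{(i)} < \bar X_{(k)}\ (1 \le i \le k-1) \mid \mu_{[k]} \ge \mu_0 + \delta_1^*,\ \mu_{[k]} \ge \mu_{[k-1]} + \delta_2^*\} \\
&\quad \ge \Pr\left\{ \frac{\bar X_{(k)} - \mu_{[k]}}{\sqrt{\sigma^2/N}} \ge \frac{c - \delta_1^*}{\sqrt{\sigma^2/N}},\ \frac{\bar X_{(i)} - \mu_{[i]}}{\sqrt{\sigma^2/N}} < \frac{\bar X_{(k)} - \mu_{[k]}}{\sqrt{\sigma^2/N}} + \frac{\delta_2^*}{\sqrt{\sigma^2/N}}\ (1 \le i \le k-1) \right\}, \qquad (8a)
\end{aligned}$$

where $\bar X_{(i)}$ denotes the overall sample mean based on $N$ independent
observations from the population having mean $\mu_{[i]}$ ($1 \le i \le k$). Continuing, we see
that the r.h.s. of (8a) is no less than

$$\begin{aligned}
&\Pr\left\{ \frac{\bar X_{(k)} - \mu_{[k]}}{\sqrt{S^2/N}} \ge -h_1,\ \frac{\bar X_{(i)} - \mu_{[i]}}{\sqrt{S^2/N}} < \frac{\bar X_{(k)} - \mu_{[k]}}{\sqrt{S^2/N}} + h_2\ (1 \le i \le k-1) \right\} \\
&\quad = \int_0^\infty \left[ \int_{-h_1 z}^{\infty} \Phi^{k-1}(y + h_2 z)\, \varphi(y)\, dy \right] g_n(z)\, dz \qquad (8b)
\end{aligned}$$

provided that

$$N \ge \max\left\{ \left( \frac{S h_1}{\delta_1^* - c} \right)^2,\ \left( \frac{S h_2}{\delta_2^*} \right)^2 \right\}. \qquad (9)$$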


Combining (7) and (9), we see that we must have

$$N \ge S^2 \max\left\{ \left( \frac{h_0}{\delta_0^* + c} \right)^2,\ \left( \frac{h_1}{\delta_1^* - c} \right)^2,\ \left( \frac{h_2}{\delta_2^*} \right)^2 \right\}. \qquad (10)$$

Now we will be choosing $h_0$, $h_1$, $h_2$, $c$ so that the probabilities (6b), (8b)
are equal to $P_0^*$, $P_1^*$, respectively. Subject to these conditions, we see that
the r.h.s. of (10) is minimized when

$$\frac{h_0^2}{(\delta_0^* + c)^2} = \frac{h_1^2}{(\delta_1^* - c)^2} = \frac{h_2^2}{\delta_2^{*2}}.$$

Thus, denoting $h_0$ by $h$, we find that if

$$h_1 = \frac{h(\delta_1^* - c)}{\delta_0^* + c}, \qquad h_2 = \frac{h\delta_2^*}{\delta_0^* + c}, \qquad (11)$$

we will be able to select the most economical sample size $N$. Substituting
this minimax choice of $h_1$, $h_2$ in the expressions (6b), (8b), and equating
the resulting integrals to $P_0^*$, $P_1^*$, respectively, we obtain (4a), (4b),
respectively. This completes the proof.
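A simulation gives an independent check of the theorem in a concrete case. The Python sketch below runs procedure (3) many times with the constants of Section 7 ($k = 3$, $N_0 = 16$, $h = 2.1855$, $\gamma = 0.6234$, $\delta^* = 250$ psi, $\delta_0^* = 0$) at the configurations where the inequalities in (6a) and (8a) become equalities: all $\mu_i = \mu_0$ for (2a), and $\mu_{[k]} = \mu_0 + \delta^*$ with the other means equal to $\mu_{[k]} - \delta^*$ for (2b). The true value $\sigma = 753.1$ psi is an assumption of the sketch, and all names are hypothetical. The mean of the $N - N_0$ second-stage observations is drawn directly as one normal variate with variance $\sigma^2/(N - N_0)$, which has the same distribution.

```python
import math
import random

K, N0 = 3, 16
MU0, DELTA = 60_000.0, 250.0          # psi; delta0* = 0, delta1* = delta2* = DELTA
H, C = 2.1855, 0.6234 * DELTA         # Table 1 constants quoted in Section 7
SIGMA = 753.1                         # assumed true sigma for the simulation


def run_once(mu, rng):
    """One run of procedure (3); returns the selected index or None."""
    first = [[rng.gauss(m, SIGMA) for _ in range(N0)] for m in mu]
    first_means = [sum(row) / N0 for row in first]
    s2 = sum(sum((v - fm) ** 2 for v in row)
             for row, fm in zip(first, first_means)) / (K * (N0 - 1))
    n_total = max(math.ceil((H * math.sqrt(s2) / C) ** 2), N0)   # step (3d), delta0* = 0
    extra = n_total - N0
    means = []
    for m, fm in zip(mu, first_means):
        # mean of the `extra` second-stage observations, drawn in one step
        second = rng.gauss(m, SIGMA / math.sqrt(extra)) if extra else 0.0
        means.append((N0 * fm + extra * second) / n_total)
    best = max(range(K), key=lambda i: means[i])
    return None if means[best] < MU0 + C else best                 # step (3f)


def frequency(mu, target, reps=8000, seed=2024):
    """Fraction of simulated runs whose outcome equals target."""
    rng = random.Random(seed)
    return sum(run_once(mu, rng) == target for _ in range(reps)) / reps


p_none = frequency([MU0] * K, None)                  # configuration for (2a)
p_best = frequency([MU0, MU0, MU0 + DELTA], K - 1)   # configuration for (2b)
print(p_none, p_best)
```

Both frequencies should come out at or slightly above 0.95 and 0.90, up to a Monte Carlo error of a few thousandths.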

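Corollary 1: For $n \to \infty$, the pair $h$ ($= h_\infty$), $c$ ($= c_\infty$) which guarantee (2a)
and (2b) are the pair which satisfy the simultaneous equations:

$$\Phi^k(h) = P_0^*, \qquad (12a)$$

$$\int_{(c-\delta_1^*)h/(c+\delta_0^*)}^{\infty} \Phi^{k-1}\!\left(y + \frac{\delta_2^* h}{c + \delta_0^*}\right) \varphi(y)\, dy = P_1^*. \qquad (12b)$$

If $P_0^* > 2^{-k}$ and $P_1^* > (1-2^{-k})/k$, then $h_\infty > 0$ and $-\delta_0^* < c_\infty < \delta_1^*$.

Proof: Obvious, since $S^2 \to \sigma^2$ w.p. 1.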


Remark 2. The case $n \to \infty$ corresponds to the case when $\sigma^2$ is known. When
this is so, the probability requirements (2a,b) are clearly satisfied by taking
$N_0 = N = \sigma^2(h_\infty/(\delta_0^* + c_\infty))^2$, ignoring the integer condition on $N$. With this
substitution, the equations (12a,b) reduce exactly to the equations (4a,b) of
Bechhofer and Turnbull (1974).
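For $n \to \infty$, (12a) can be solved in closed form, $h_\infty = \Phi^{-1}\big((P_0^*)^{1/k}\big)$. Since $\delta_0^* + \delta_1^* > 0$, the lower limit in (12b) increases and the shift $\delta_2^* h/(c + \delta_0^*)$ decreases as $c$ grows, so the left-hand side of (12b) is strictly decreasing in $c$ on $c > -\delta_0^*$ and tends to 1 as $c \downarrow -\delta_0^*$; hence $c_\infty$ can be found by bisection. The Python sketch below (standard library only; function names, the Simpson rule, and the truncation of $y$ to $[-9, 9]$ are choices of this sketch) does this and then evaluates the known-variance sample size of Remark 2 for a hypothetical known $\sigma$.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def h_inf(k, p0):
    """(12a): Phi^k(h) = P0*, so h = Phi^{-1}(P0*^(1/k))."""
    return STD.inv_cdf(p0 ** (1.0 / k))


def lhs_12b(k, h, c, d0, d1, d2, m=400):
    """Left-hand side of (12b) by composite Simpson on y in [max(lower, -9), 9]."""
    lower = max((c - d1) * h / (c + d0), -9.0)
    if lower >= 9.0:
        return 0.0
    shift = d2 * h / (c + d0)
    f = lambda y: STD.cdf(y + shift) ** (k - 1) * STD.pdf(y)
    step = (9.0 - lower) / m
    total = f(lower) + f(9.0)
    total += sum((4 if i % 2 else 2) * f(lower + i * step) for i in range(1, m))
    return total * step / 3


def c_inf(k, p0, p1, d0, d1, d2, tol=1e-10):
    """Solve (12b) for c on (-d0, d1) by bisection."""
    h = h_inf(k, p0)
    lo, hi = -d0, d1
    if lhs_12b(k, h, hi, d0, d1, d2) > p1:
        raise ValueError("no root in (-delta0*, delta1*) for these constants")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs_12b(k, h, mid, d0, d1, d2) > p1:
            lo = mid   # left-hand side still above P1*: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Section 7 requirements in units of delta* (delta0* = 0, delta1* = delta2* = 1).
h, c = h_inf(3, 0.95), c_inf(3, 0.95, 0.90, 0.0, 1.0, 1.0)
sigma_over_delta = 753.1 / 250.0   # hypothetical "known" sigma, in delta* units
print(h, c, (sigma_over_delta * h / (0.0 + c)) ** 2)   # Remark 2: sigma^2 (h/(delta0* + c))^2
```

For $k = 1$ the inner integral is just $1 - \Phi(\text{lower limit})$, so $c_\infty = h_\infty/(h_\infty + \Phi^{-1}(P_1^*))$ when $\delta_0^* = 0$, $\delta_1^* = \delta_2^* = 1$, which gives a convenient check of the bisection.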


4.2 Statement and proof of Theorem 2

Theorem 2: The same $(h, c)$ which guarantee (2a,b) when $\delta_1^* = \delta_2^* = \delta^*$ (say)
also guarantee

$$\Pr\{\Pi_{[k]} \text{ or } \Pi_{[k-1]} \text{ or } \cdots \text{ or } \Pi_{[k-t+1]}\} \ge P_1^*$$

whenever

a) $\mu_{[k]} \ge \mu_0 + \delta^*$, $\mu_{[k-t+1]} \ge \mu_0 \ge \mu_{[k-t]}$,

or

b) $\mu_{[k]} \ge \mu_0 + \delta^*$, $\mu_{[k-t+1]} \ge \mu_{[k-t]} + \delta^*$,

for any $t$ ($1 \le t \le k$), the inequality being strict if $t > 1$; here the
event $\Pi_{[i]}$ ($1 \le i \le k$) corresponds to selecting the population associated with
$\mu_{[i]}$, and we define $\mu_{[0]} = -\infty$.

Proof of Theorem 2

The proof follows the same lines as that given for Theorem 2 in B-T (1974),
and thus is not given here.
5. The expected sample size and the choice of $N_0$

Following Stein (1945), we have

$$E\{N\} = N_0 \Pr\{\chi^2_n < nN_0 z^2/\sigma^2\} + (\sigma^2/z^2) \Pr\{\chi^2_{n+2} > nN_0 z^2/\sigma^2\} + \theta \Pr\{\chi^2_n > nN_0 z^2/\sigma^2\}, \qquad (13)$$

where $z = (\delta_0^* + c)/h$ and $0 \le \theta < 1$. A similar expression can be derived for
$E\{N^2\}$. (An expression analogous to (13) has been tabulated in Tables I-IV of
Seelbinder (1953); his $c = d/\sigma$, $t$, $n_0$ correspond to our $(\delta_0^* + c)/\sigma$, $h$, $k(N_0 - 1)$,
respectively; however, his tables are applicable here only when $k = 1$.)
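A short derivation of (13), reconstructed here from the definitions in (3) rather than quoted from Stein (1945): write $W = nS^2/\sigma^2 \sim \chi^2_n$ and $a = nN_0 z^2/\sigma^2$, so that $S^2/z^2 = \sigma^2 W/(n z^2)$, and note that $[x] + 1 = \lceil x \rceil$ for the $[x]$ of step (3d).

```latex
\begin{aligned}
N &= \max\{\lceil S^2/z^2 \rceil,\ N_0\}
   = \begin{cases}
       N_0, & W \le a,\\
       S^2/z^2 + \vartheta(S),\ \ 0 \le \vartheta(S) < 1, & W > a,
     \end{cases}\\
E\{N\} &= N_0 \Pr\{W \le a\}
        + \frac{\sigma^2}{n z^2}\, E\{W\,\mathbf{1}(W > a)\}
        + E\{\vartheta(S)\,\mathbf{1}(W > a)\},\\
w\, f_{\chi^2_n}(w) &= n\, f_{\chi^2_{n+2}}(w)
  \;\Longrightarrow\;
  E\{W\,\mathbf{1}(W > a)\} = n \Pr\{\chi^2_{n+2} > a\}.
\end{aligned}
```

The middle term is therefore $(\sigma^2/z^2)\Pr\{\chi^2_{n+2} > a\}$, and the last term equals $\theta\Pr\{\chi^2_n > a\}$ for some $0 \le \theta < 1$, which gives (13).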

If the experimenter has some idea as to the possible values of $\sigma$, this
information can be used to assist in the choice of $N_0$. For example, suppose
that it can reasonably be assumed that $\sigma$ lies in the range $[\sigma_1, \sigma_2]$. Using
the minimax regret criterion of Seelbinder (1953), $N_0$ would be chosen to
minimize the maximum expected loss in number of extra observations needed due
to ignorance of $\sigma$, i.e., $N_0$ minimizes

$$\max_{\sigma_1 \le \sigma \le \sigma_2} \left( E\{N\} - \left( \frac{\sigma h_\infty}{\delta_0^* + c_\infty} \right)^2 \right), \qquad (14)$$

where $h_\infty$, $c_\infty$ are the solutions of (12a,b). In (14), $E\{N\}$ of course depends
on $N_0$ and $\sigma$, while by Remark 2 in Section 4.1 the second term represents
the total sample size (within unity) if $\sigma$ were known. For the special case
$\delta_0^* = 0$, $\delta_1^* = \delta_2^* = \delta^*$, the values of $h_\infty$ and $c_\infty$ ($= \gamma_\infty \delta^*$) can be obtained
from the bottom line of the appropriate column in the tables of Section 6, or,
alternatively, from the tables in Bechhofer and Turnbull (1974). (There,
quantities $\lambda_1^*$, $\lambda_2^*$ are tabulated; these are related to $h_\infty$, $\gamma_\infty$ by $h_\infty = \lambda_1^*$,
$\gamma_\infty = \lambda_1^*/\lambda_2^*$.)
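Criterion (14) is straightforward to approximate by simulation. In the Python sketch below (all names hypothetical), $E\{N\}$ is estimated by drawing $nS^2/\sigma^2$ from a $\chi^2_n$ distribution, i.e. a gamma distribution with shape $n/2$ and scale 2 (`random.gammavariate`); the maximum over $[\sigma_1, \sigma_2]$ is taken over a finite grid of $\sigma$ values; and the finite-$n$ constants $(h, c)$ for each candidate $N_0$ are supplied by a caller-provided function `design(n)`, standing in for a lookup in Table 1.

```python
import math
import random


def expected_n(n0, k, sigma, h, c, delta0, reps=4000, seed=7):
    """Monte Carlo estimate of E{N} for procedure (3) with N0 = n0."""
    rng = random.Random(seed)
    n = k * (n0 - 1)
    z = (delta0 + c) / h
    total = 0
    for _ in range(reps):
        s2 = sigma ** 2 * rng.gammavariate(n / 2, 2.0) / n   # n S^2 / sigma^2 ~ chi^2_n
        total += max(math.ceil(s2 / z ** 2), n0)             # step (3d)
    return total / reps


def minimax_regret_n0(candidates, k, sigma_grid, design, h_inf, c_inf, delta0):
    """Pick N0 from candidates by criterion (14); h_inf, c_inf solve (12a,b)."""
    def worst_regret(n0):
        h, c = design(k * (n0 - 1))   # hypothetical lookup of the finite-n constants
        return max(expected_n(n0, k, s, h, c, delta0)
                   - (s * h_inf / (delta0 + c_inf)) ** 2
                   for s in sigma_grid)
    return min(candidates, key=worst_regret)
```

With `design` returning the Table 1 pair $(h^*, \gamma^*\delta^*)$ for the relevant $n$, a call such as `minimax_regret_n0(range(2, 41), 3, [500.0, 750.0, 1000.0], design, h_inf, c_inf, 0.0)` would return the recommended $N_0$; the grid and range here are illustrative only.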

It is clear that the variance of $N$ increases as $N_0$ decreases. Hence,
as an alternative to the above procedure for choosing $N_0$, we may wish to
increase $N_0$ (perhaps by only a small number of observations if $k$ is large)
in order to reduce the probability of being required to take an extremely large
$N$; this gain is purchased at the cost of a slight increase in $E\{N\}$. Moshman
(1958) has discussed such criteria in a similar two-stage problem, and his
methods are applicable here.

Finally, there may be a specified upper bound $N^*$ on the total sample size
$N$. If the procedure calls for more than $N^*$ observations, then $N^*$ observations
are taken, and the probabilities $P_0^*$, $P_1^*$ that can be guaranteed are reduced.
Such a procedure can be legitimately constructed by using methods similar to
those of Wormleighton (1960).


6. Tables

Table 1 gives the solution $(h^*, \gamma^*)$ of equations (5a,b) for $k = 1(1)6, 10$
and selected $n$ and $\{P_0^*, P_1^*\}$. The tabulated values of $(h^*, \gamma^*)$ are calculated
to an accuracy of $\pm 10^{-4}$ in the associated $(P_0^*, P_1^*)$-values.

7. A numerical example of the use of the tables

A consumer is to decide whether or not to purchase one lot of bolts from
among three lots which are being offered for his consideration. The tensile
strength ($X$) of the bolts in the $i$th lot ($1 \le i \le 3$) may be assumed to be
normally distributed with unknown variance $\sigma^2$ and unknown mean $\mu_i$. A lot
is deemed acceptable only if the bolts in the lot have a mean tensile strength
of at least 60,000 psi. Suppose that he asks for a procedure which will
guarantee $P_0^* = 0.95$ and $P_1^* = 0.90$ with $\delta_0^* = 0$, $\delta_1^* = \delta_2^* = 250$ psi and
$\mu_0 = 60{,}000$ psi. How should we proceed to guarantee his requirement?

Any choice of $N_0 \ge 2$ is permissible. Suppose that we take a preliminary
sample of $N_0 = 16$ observations (bolts) from each lot, and using (3b) we
obtain the estimate $S^2 = (753.1\ \text{psi})^2$ of $\sigma^2$ based on $n = 3(16 - 1) = 45$ d.f.
Entering Table 1 with $k = 3$, $n = 45$, $P_0^* = 0.95$, $P_1^* = 0.90$, we find that
$h^* = 2.1855$ and $\gamma^* = 0.6234$. Hence, $c = (0.6234)(250) = 156$ psi, and from (3d) we
see that $N = \max\{[((2.1855)(753.1)/((0.6234)(250)))^2] + 1,\ 16\} = \max\{[111.53] + 1,\ 16\} = 112$.
Therefore 96 additional observations (bolts) are taken from each lot, and the
three sample means (based on all 112 observations per lot) are computed. If all
of these sample means are less than $\mu_0 + c = 60{,}156$ psi, then no lot is
accepted; otherwise the lot that produced the largest sample mean is accepted.
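The arithmetic of this example can be reproduced in a few lines of Python; the values of $h^*$ and $\gamma^*$ are the Table 1 entries quoted above, and nothing else is assumed.

```python
import math

k, n0 = 3, 16
mu0, delta = 60_000.0, 250.0       # psi; delta0* = 0, delta1* = delta2* = delta
h, gamma = 2.1855, 0.6234          # Table 1, k = 3, n = 45, P0* = 0.95, P1* = 0.90
s = 753.1                          # first-stage estimate of sigma, psi

n = k * (n0 - 1)                   # degrees of freedom of S^2: 45
c = gamma * delta                  # about 155.85 psi
x = (h * s / c) ** 2               # about 111.53
N = max(math.ceil(x), n0)          # [x] + 1 = ceil(x); gives 112
print(n, N, N - n0, round(mu0 + c, 2))
```

The unrounded threshold is $\mu_0 + c \approx 60{,}155.85$ psi; the text rounds it to 60,156 psi.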

8. Extensions and directions of future research

It is straightforward to extend the results in this paper to the case of
known variance ratios, i.e., when it is assumed that the variance of population
$\Pi_i$ ($1 \le i \le k$) is $\sigma_i^2 = a_i \sigma^2$, where the $a_i$ are known and $\sigma^2$ is unknown;
the methods would be analogous to those used in B-D-S (1954).

It would be desirable to devise two-stage (or multi-stage) procedures when
the $a_i$ are also unknown. Presumably the methods of Dudewicz and Dalal (1971)
or of Rinott (1974) could be extended to deal with this problem.